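Many robotic tasks involving some form of 3D visual perception benefit greatly from complete knowledge of the working environment. However, robots often have to cope with unstructured environments, and their onboard visual sensors can provide only incomplete information because of limited workspaces, clutter, or object self-occlusions. In recent years, deep learning architectures for shape completion have begun to gain traction as an effective means of inferring a complete 3D object representation from partial visual data. Nevertheless, most existing state-of-the-art approaches provide a fixed output resolution in the form of voxel grids, which is strictly tied to the size of the neural network output stage. While this is sufficient for some tasks, such as obstacle avoidance in navigation, grasping and manipulation require finer resolutions, and simply scaling up the neural network output is computationally expensive. In this paper, we address this limitation with an object shape completion method based on an implicit 3D representation that provides a confidence value for each reconstructed point. As a second contribution, we propose a gradient-based method for efficiently sampling such an implicit function at an arbitrary resolution at inference time. We validate our approach experimentally by comparing the reconstructed shapes with the ground truth and by deploying our shape completion algorithm in a robotic grasping pipeline. In both cases, we compare the results with a state-of-the-art shape completion approach.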
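Action recognition is a fundamental capability for humanoid robots that interact and cooperate with humans. This application requires the action recognition system to be designed so that new actions can be added easily, while unknown actions are identified and ignored. In recent years, deep learning approaches have been the principal solution to the action recognition problem. However, most models require large datasets of manually labeled samples. In this work, we target one-shot deep learning models, because they can work with a single instance per class. Unfortunately, one-shot models assume that, at inference time, the action to be recognized falls within the support set, and they fail when the action lies outside it. Few-Shot Open-Set Recognition (FSOSR) solutions attempt to address this flaw, but current solutions consider only static images rather than image sequences. Static images are insufficient to discriminate between actions such as sitting down and standing up. In this paper, we propose a novel model that addresses the FSOSR problem with a one-shot model augmented with a discriminator that rejects unknown actions. The model is useful for applications in humanoid robotics, because it makes it easy to add new classes and to determine whether an input sequence is among those known to the system. We show how to train the whole model end to end, and we perform quantitative and qualitative analyses. Finally, we provide real-world examples.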
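Performing joint interaction requires constant mutual monitoring of one's own actions and of their effects on the other's behaviour. Such action-effect monitoring is boosted by social cues and may result in an increasing sense of agency. Joint action and joint attention are strictly correlated, and both contribute to the formation of precise temporal coordination. In human-robot interaction, the robot's ability to establish joint attention with a human partner and to exploit various social cues in order to react accordingly is a crucial step towards creating communicative robots. Alongside the social component, effective human-robot interaction can be seen as a new way to improve the robot's learning process and make it more natural and robust. In this work, we use different social skills, such as mutual gaze, gaze following, speech, and human face recognition, to develop an effective teacher-learner scenario tailored to visual object learning in dynamic environments. Experiments on the iCub robot demonstrate that the system allows the robot to learn new objects through natural interaction with a human teacher, even in the presence of distractors.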
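The requirements of a robot's visual system differ depending on the application: it may need high accuracy or reliability, be constrained by limited resources, or need to adapt quickly to dynamically changing environments. In this work, we focus on the instance segmentation task and provide a comprehensive study of different techniques that allow an object segmentation model to be adapted in the presence of novel objects or different domains. We propose a pipeline for fast instance segmentation learning designed for robotic applications in which data arrive as a stream. It is based on a hybrid method that leverages a pre-trained CNN for feature extraction together with fast-to-train kernel-based classifiers. We also propose a training protocol that shortens the training time by performing feature extraction during data acquisition. We benchmark the proposed pipeline on two robotics datasets and then deploy it on a real robot, the iCub humanoid. To this end, we adapt our method to an incremental setting in which the robot learns novel objects online. The code to reproduce the experiments is publicly available on GitHub.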
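We consider the task of object grasping with a prosthetic hand capable of multiple grasp types. In this setting, communicating the intended grasp type often imposes a high cognitive load on the user, which can be reduced by adopting shared autonomy frameworks. Among these, so-called eye-in-hand systems automatically control the pre-shaping of the hand before the grasp, based on visual input from a camera on the wrist. In this paper, we present an eye-in-hand, learning-based approach for hand pre-shape classification from RGB sequences. Unlike previous work, we design the system to support the possibility of grasping each considered object part with a different grasp type. To overcome the lack of data of this kind and to reduce the need for tedious data collection sessions for training the system, we devise a pipeline for rendering synthetic visual sequences of hand trajectories. We develop a sensorized setup to acquire real human grasping sequences for benchmarking and show that, on practical use cases, models trained with our synthetic dataset achieve better generalization performance than models trained on real data. Finally, we integrate our model into the Hannes prosthetic hand and show its practical effectiveness. We make the code and dataset publicly available to reproduce the presented results.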
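6D object pose tracking has been studied extensively in the robotics and computer vision communities. The most promising solutions, which leverage deep neural networks and/or filtering and optimization, exhibit notable performance on standard benchmarks. However, to the best of our knowledge, they have not been tested thoroughly against fast object motions. Tracking performance degrades significantly in this scenario, especially for methods that do not achieve real-time performance and introduce non-negligible delays. In this work, we introduce ROFT, a Kalman filtering approach for 6D object pose and velocity tracking from a stream of RGB-D images. By leveraging real-time optical flow, ROFT synchronizes the delayed outputs of low-frame-rate convolutional neural networks for instance segmentation and 6D object pose estimation with the RGB-D input stream, achieving fast and precise 6D object pose and velocity tracking. We test our method on a newly introduced photorealistic dataset, Fast-YCB, which comprises fast-moving objects from the YCB model set, and on HO-3D, a dataset for object and hand pose estimation. The results show that our approach outperforms state-of-the-art methods for 6D object pose tracking, while also providing 6D object velocity tracking. A video showing the experiments is provided as supplementary material.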
We are witnessing a widespread adoption of artificial intelligence in healthcare. However, most of the advancements in deep learning (DL) in this area consider only unimodal data, neglecting other modalities, even though their multimodal interpretation is necessary for supporting diagnosis, prognosis and treatment decisions. In this work we present a deep architecture, explainable by design, which jointly learns modality reconstructions and sample classifications using tabular and imaging data. The explanation of the decision taken is computed by applying a latent shift that simulates a counterfactual prediction, revealing the features of each modality that contribute the most to the decision, together with a quantitative score indicating the importance of each modality. We validate our approach in the context of the COVID-19 pandemic using the AIforCOVID dataset, which contains multimodal data for the early identification of patients at risk of severe outcome. The results show that the proposed method provides meaningful explanations without degrading the classification performance.
Previous work has shown the potential of deep learning to predict renal obstruction using kidney ultrasound images. However, these image-based classifiers have been trained with single-visit inference in mind. We compare methods from video action recognition (i.e. convolutional pooling, LSTM, TSM) to adapt single-visit convolutional models to multiple-visit inference. We demonstrate that incorporating images from a patient's past hospital visits provides only a small benefit for the prediction of obstructive hydronephrosis. Therefore, inclusion of prior ultrasounds is beneficial, but prediction based on the latest ultrasound is sufficient for patient risk stratification.
Iterative regularization is a classic idea in regularization theory that has recently become popular in machine learning. On the one hand, it allows one to design efficient algorithms that control numerical and statistical accuracy at the same time. On the other hand, it helps shed light on the learning curves observed while training neural networks. In this paper, we focus on iterative regularization in the context of classification. After contrasting this setting with that of regression and inverse problems, we develop an iterative regularization approach based on the hinge loss function. More precisely, we consider a diagonal approach for a family of algorithms for which we prove convergence as well as rates of convergence. Our approach compares favorably with other alternatives, as also confirmed in numerical simulations.
With more and more data being collected, data-driven modeling methods have been gaining in popularity in recent years. While physically sound, classical gray-box models are often cumbersome to identify and scale, and their accuracy might be hindered by their limited expressiveness. On the other hand, classical black-box methods, which nowadays typically rely on Neural Networks (NNs), often achieve impressive performance, even at scale, by deriving statistical patterns from data. However, they remain completely oblivious to the underlying physical laws, which may lead to catastrophic failures if decisions for real-world physical systems are based on them. Physically Consistent Neural Networks (PCNNs) were recently developed to address these issues, ensuring physical consistency while still leveraging NNs to attain state-of-the-art accuracy. In this work, we scale PCNNs to model building temperature dynamics and propose a thorough comparison with classical gray-box and black-box methods. More precisely, we design three distinct PCNN extensions, thereby exemplifying the modularity and flexibility of the architecture, and formally prove their physical consistency. In the presented case study, PCNNs are shown to achieve state-of-the-art accuracy, even outperforming classical NN-based models despite their constrained structure. Our investigations furthermore provide a clear illustration of NNs achieving seemingly good performance while remaining completely physics-agnostic, which can be misleading in practice. While this performance comes at the cost of computational complexity, PCNNs show accuracy improvements of 17-35% compared to all other physically consistent methods, paving the way for scalable physically consistent models with state-of-the-art performance.